This file contains the full R workflow for processing and analyzing raw data, as well as generating figures for the manuscript. The Rmarkdown file can be downloaded from the Code drop down menu (top right).
Overview
This document presents a comprehensive analysis for developing predictive models of Stagonospora nodorum blotch (SNB) disease severity in winter wheat across North Carolina. We compare Bayesian hierarchical modeling approaches with machine learning methods (XGBoost) to identify key environmental and management factors influencing disease development.
Objectives
Develop robust predictive models for SNB severity using multi-year, multi-location field data
Compare Bayesian hierarchical GLMs with XGBoost machine learning approaches
Identify critical pre-anthesis weather variables and management practices affecting disease
Study design
Data Collection: Field trials conducted across multiple sites in North Carolina over multiple growing seasons
Validation Strategy:
- CV0 (held-out validation): cross-validation using four randomly selected sites completely held out for testing
Load packages
Code
# Statistical modeling
library(brms)             # Bayesian regression models
library(tidymodels)       # Machine learning framework
library(marginaleffects)  # Marginal effects and predictions

# Machine learning
library(xgboost)          # Gradient boosting
library(vip)              # Variable importance plots
library(caret)            # Data processing

# Data manipulation and visualization
library(tidyverse)        # Data wrangling
library(data.table)       # Fast data operations
library(patchwork)        # Combine plots

# Bayesian tools
library(tidybayes)        # Tidy Bayesian analysis
library(extraDistr)       # Additional distributions

# Visualization
library(scales)           # Scale functions for ggplot2
library(ggbeeswarm)       # Beeswarm plots
library(gghalves)         # Half-half plots
library(ggtext)           # Enhanced text rendering

# Clean workspace
rm(list = ls())

# Set default theme
theme_set(theme_bw(base_size = 10))
0.1 Data Preparation
Load and transform SNB data
Code
# Load disease severity data
load("data/SNB.RData")  # BLUEs (Best Linear Unbiased Estimates) for each disease case

# Apply the Smithson-Verkuilen transformation to handle boundary values.
# This transformation maps proportions containing 0 and 1 to the open
# interval (0, 1). Formula: (y * (n - 1) + 0.5) / n, where y is the
# original proportion.
n = nrow(SNB)
SNB$sev = round((SNB$sev / 100 * (n - 1) + 0.5) / n, 3)

# Ordering factor levels
SNB$site = factor(SNB$site)
SNB$region = factor(SNB$region)
SNB$residue = factor(SNB$residue, levels = c("N", "Y"))
SNB$resistance = factor(SNB$resistance, levels = c("MR", "MS", "S"))
SNB$snb_onset = factor(SNB$snb_onset, levels = c("post_anthesis", "pre_anthesis"))
The data were originally stored as an RData file. SNB.RData contains the outcome (sev_lsm), which is continuous with support (0, 1), and several plot-level covariates: cultivar resistance (a categorical variable with levels susceptible [S], moderately susceptible [MS], and moderately resistant [MR]), residue (whether infested wheat residue is present), site, sigma (the residual standard error for each trial), whether SNB onset was before or after anthesis, and region (an additional grouping level).
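As a quick illustration of the boundary transformation applied above, here is a self-contained sketch with a toy vector of percent severities (the helper function and example values are ours, not objects from the workflow):

```r
# Smithson-Verkuilen transformation: squeeze proportions containing
# exact 0s and 1s into the open interval (0, 1).
sv_transform <- function(pct, n) {
  y <- pct / 100              # convert percent severity to a proportion
  (y * (n - 1) + 0.5) / n     # shrink toward 0.5 by an amount that vanishes as n grows
}

sev_pct <- c(0, 5, 46, 100)   # illustrative percent severities, including boundaries
sv_transform(sev_pct, n = length(sev_pct))
```

After the transform, every value lies strictly inside (0, 1), which is what the beta likelihood used later requires.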
Code
skimr::skim(SNB)
Data summary

Name                     SNB
Number of rows           151
Number of columns        7
Column type frequency:
  factor                 5
  numeric                2
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
site           0          1              FALSE    18        CL2: 10, KS2: 10, PY2: 10, RW2: 10
residue        0          1              FALSE    2         Y: 78, N: 73
resistance     0          1              FALSE    3         MS: 70, S: 46, MR: 35
snb_onset      0          1              FALSE    2         pre: 79, pos: 72
region         0          1              FALSE    3         Pie: 71, Sou: 52, Mid: 28

Variable type: numeric

skim_variable  n_missing  complete_rate  mean  sd    p0   p25   p50   p75   p100  hist
sev            0          1              0.08  0.08  0.0  0.02  0.05  0.11  0.46  ▇▂▁▁▁
sigma          0          1              1.34  0.66  0.3  0.90  1.20  1.70  2.90  ▅▇▃▁▂
0.2 Exploratory data analysis
0.2.1 Severity distribution
Code
sum(SNB$sev <0.03)/nrow(SNB)
[1] 0.3509934
Code
sum(SNB$sev >0.2)/nrow(SNB)
[1] 0.09271523
Note the large proportion of cases (about 35%) with severity below 3%.
0.2.2 Wheat residue
Code
table(SNB$residue)
N Y
73 78
Balanced number of cases for residue
The presence of P. nodorum-infested wheat residue consistently increases disease severity, confirming the importance of residue management as a cultural control strategy.
0.2.3 Cultivar resistance (Reaction classes to SNB)
Code
table(SNB$resistance)
MR MS S
35 70 46
Code
table(SNB$resistance)/nrow(SNB)
MR MS S
0.2317881 0.4635762 0.3046358
Moderately susceptible cultivars account for the largest share of observations, and moderately resistant cultivars the smallest.
Another factor associated with disease intensity is cultivar resistance. Resistant (R) cultivars have a genetic architecture that limits fungal colonization and development, whereas moderately susceptible (MS) and susceptible (S) cultivars are more prone to disease development.
0.2.4 SNB onset timing
Code
# How many observations report SNB onset?
table(SNB$snb_onset)
post_anthesis pre_anthesis
72 79
Code
table(SNB$snb_onset)/nrow(SNB)
post_anthesis pre_anthesis
0.4768212 0.5231788
Onset timing is nearly balanced, with slightly fewer observations reporting post-anthesis onset.
The earlier the infection, the higher the disease severity; the severity densities differ noticeably between the two onset classes.
Note that pre-anthesis variable names do not contain “-”, because they are computed from pre-anthesis windows. A negative sign in a variable name indicates a calculation involving a lag after anthesis in our code.
ETo R RH RH6.peak4 T T6.peak4 TR TRH
2 11 19 10 6 8 29 32
intra
24h dawn daytime dusk nighttime
12 20 26 39 20
To rigorously assess model generalizability to new locations, we implement a spatial hold-out strategy where entire sites are excluded from model training.
Code
# CV0 sites. Previously, a sample of sites was removed from the window-pane
# analysis. These sites are not used for either training or regular testing,
# only CV0 testing. Here the sites must be 'OX23', 'LB24', 'SB24', 'CL22'.
set.seed(125)
out_sites = sample(unique(SNB$site), 4, replace = F) |> droplevels()
out_sites
tree_depth min_n loss_reduction sample_size
Min. :1.00 Min. : 2.00 Min. : 0.00000 Min. :0.100
1st Qu.:1.75 1st Qu.:10.75 1st Qu.: 0.00000 1st Qu.:0.325
Median :2.50 Median :20.50 Median : 0.00006 Median :0.550
Mean :2.50 Mean :20.54 Mean : 1.38796 Mean :0.550
3rd Qu.:3.25 3rd Qu.:30.25 3rd Qu.: 0.04264 3rd Qu.:0.775
Max. :4.00 Max. :40.00 Max. :31.62278 Max. :1.000
mtry learn_rate
Min. : 1.00 Min. :0.000e+00
1st Qu.: 30.75 1st Qu.:2.000e-08
Median : 61.50 Median :3.190e-06
Mean : 61.51 Mean :5.417e-03
3rd Qu.: 92.25 3rd Qu.:5.661e-04
Max. :123.00 Max. :1.000e-01
# Finalize workflow with best hyperparameters
final_xgb = finalize_workflow(xgb_wf, select_best(xgb_res, metric = "rmse"))

# Fit to training data and evaluate on test set
final_fit_xgb = last_fit(final_xgb, splits)
final_xgb
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Formula
Model: boost_tree()
── Preprocessor ────────────────────────────────────────────────────────────────
sev ~ .
── Model ───────────────────────────────────────────────────────────────────────
Boosted Tree Model Specification (regression)
Main Arguments:
mtry = 53
trees = 1000
min_n = 11
tree_depth = 2
learn_rate = 0.0455226827550731
loss_reduction = 1.66540163750322e-06
sample_size = 0.943037974683544
Computational engine: xgboost
Code
collect_metrics(final_fit_xgb)
1.5 Prediction performance
2 Bayesian hierarchical modeling
2.1 Setup
2.1.1 Data scaling
Code
# Get weather predictors that need to be scaled
weather_predictors = colnames(SNB_data[grepl("fa", names(SNB_data))])

# Scale based on the entire dataset, not only on training data
mean_data = colMeans(SNB_data[, weather_predictors], na.rm = TRUE)
sd_data = apply(SNB_data[, weather_predictors], 2, sd, na.rm = TRUE)

# Scale SNB_train
SNB_train[, weather_predictors] = scale(SNB_train[, weather_predictors],
                                        center = mean_data, scale = sd_data)

# Scale SNB_test
SNB_test[, weather_predictors] = scale(SNB_test[, weather_predictors],
                                       center = mean_data, scale = sd_data)

# Scale SNB_test0
SNB_test0[, weather_predictors] = scale(SNB_test0[, weather_predictors],
                                        center = mean_data, scale = sd_data)
Weather predictors for both training and testing are scaled using the mean and SD of the entire dataset, not just a subset.
Note that the matrices only contain environments present in the training dataset, not in the testing datasets.
2.1.4 Beta regression
The beta distribution is a family of continuous probability distributions defined on the interval [0, 1] (or (0, 1)) in terms of two positive shape parameters, alpha (\(\alpha\)) and beta (\(\beta\)), which appear as exponents of the variable and of its complement to 1, respectively (hence its suitability for modeling proportions), and control the shape of the distribution.
As an example, consider the distribution of scores on a test where most people get 6/10, or 60%: we can use 6 and 4 as the shape parameters. Most of the probability mass sits around 60%, and the distribution is asymmetric rather than centered.
The magic of—and most confusing part about—beta distributions is that you can get all sorts of curves by just changing the shape parameters. To make this easier to see, we can make a bunch of different beta distributions.
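A few densities computed with base R's `dbeta()` make this concrete (the parameter choices below are illustrative, not taken from the fitted models):

```r
# The same beta family yields very different shapes depending on (alpha, beta).
x <- seq(0.01, 0.99, by = 0.01)

d_symmetric  <- dbeta(x, 2, 2)   # symmetric hump around 0.5
d_right_skew <- dbeta(x, 2, 8)   # most mass near 0, like our SNB severities
d_scores     <- dbeta(x, 6, 4)   # the exam-score example: asymmetric, near 0.6

# Closed-form summaries for the (6, 4) example:
mean_64 <- 6 / (6 + 4)           # mean = alpha / (alpha + beta) = 0.6
mode_64 <- (6 - 1) / (6 + 4 - 2) # mode = (alpha - 1) / (alpha + beta - 2) = 0.625
```

Because the mean (0.6) and mode (0.625) differ, the (6, 4) density is visibly skewed, which is exactly the asymmetry described above.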
2.1.5 Priors
Residue: 1.8 ± 0.4
Resistance S: 1.6 ± 0.4
Resistance MS: 1.1 ± 1.3
Onset: 0.8 ± 2.3
Intercept: -3.9
Priors for categorical agronomic predictors were derived by converting expected percentage-point effects into their corresponding logit-scale values under a baseline severity of 2% (no residue and MR cultivar). Because the logit transformation is steep near zero, modest absolute changes in severity (for example, 4–10%) map to comparatively large effects on the linear predictor, yielding prior means of 1.2–1.8 for residue and resistance classes. The associated standard deviations (0.3–1.1) reflect the empirical uncertainty around these effects, with wider values producing weaker priors.
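A minimal sketch of this conversion, using the 2% baseline stated above (the `logit` helper and the 11% example severity are our illustrative choices, not workflow objects):

```r
# Convert expected percentage-point effects into logit-scale prior means.
logit <- function(p) log(p / (1 - p))

baseline <- 0.02          # ~2% severity: no residue, MR cultivar
logit(baseline)           # about -3.9, matching the intercept prior mean

# Example: an effect raising severity from 2% to ~11% corresponds to a
# logit-scale effect of about 1.8, the residue prior mean above.
logit(0.11) - logit(baseline)
```

This illustrates the steepness of the logit near zero: a nine-percentage-point change in severity maps to a coefficient of roughly 1.8 on the linear predictor.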
2.2 Model Development
2.2.1 M1
The simplest model including only agronomic and host factors without accounting for site-specific variation or weather effects.
Formula:
sev ~ residue + snb_onset + resistance
Key Features:
- Fixed effects for crop residue, disease onset timing, and cultivar resistance
- No random effects or weather variables
- Establishes baseline prediction performance
Code
M1 = brm(
  bf(sev ~ residue + snb_onset + resistance),
  data = SNB_train, family = Beta(link = "logit"),
  prior = c(
    set_prior("normal(1.8, 0.4)", class = "b", coef = "residueY"),
    set_prior("normal(1.6, 0.4)", class = "b", coef = "resistanceS"),
    set_prior("normal(1.1, 1.3)", class = "b", coef = "resistanceMS"),
    set_prior("normal(0.8, 2.3)", class = "b", coef = "snb_onsetpre_anthesis"),
    set_prior("normal(-3.9, 5)", class = "Intercept"),
    set_prior("gamma(2, 0.01)", class = "phi")
  ),
  chains = chain, iter = iter, warmup = warmup, cores = cores,
  seed = seed, control = control, file = "results/models/M1"
)
Family: beta
Links: mu = logit
Formula: sev ~ residue + snb_onset + resistance
Data: SNB_train (Number of observations: 97)
Draws: 4 chains, each with iter = 10000; warmup = 2000; thin = 1;
total post-warmup draws = 32000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -3.97 0.20 -4.37 -3.59 1.00 17918
residueY 0.94 0.18 0.59 1.28 1.00 18580
snb_onsetpre_anthesis 0.60 0.19 0.23 0.97 1.00 18281
resistanceMS 0.35 0.19 -0.01 0.73 1.00 18085
resistanceS 0.71 0.18 0.35 1.07 1.00 17548
Tail_ESS
Intercept 18849
residueY 21223
snb_onsetpre_anthesis 21229
resistanceMS 17580
resistanceS 18415
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 24.27 3.66 17.61 31.99 1.00 19415 19670
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Click to view Stan code for M1
Code
cat(make_stancode(M1))
// generated with brms 2.23.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> phi;  // precision parameter
}
transformed parameters {
  // prior contributions to the log posterior
  real lprior = 0;
  lprior += normal_lpdf(b[1] | 1.8, 0.4);
  lprior += normal_lpdf(b[2] | 0.8, 2.3);
  lprior += normal_lpdf(b[3] | 1.1, 1.3);
  lprior += normal_lpdf(b[4] | 1.6, 0.4);
  lprior += normal_lpdf(Intercept | -3.9, 5);
  lprior += gamma_lpdf(phi | 2, 0.01);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    mu = inv_logit(mu);
    target += beta_lpdf(Y | mu .* phi, (1 - mu) .* phi);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Click to view priors for M1
Code
prior_summary(M1)
2.2.2 M2 - Random intercepts
Extends M1 by adding random site-level intercepts to account for unmeasured site-specific factors (e.g., inoculum pressure, microclimate, soil conditions).
Formula:
sev ~ residue + snb_onset + resistance + (1|site)
Key Features:
- Hierarchical structure with site-level random effects
- Accounts for pseudo-replication within sites
- Partial pooling of site estimates
Code
M2 = brm(
  bf(sev ~ residue + snb_onset + resistance + (1 | site)),
  data = SNB_train, family = Beta(link = "logit"),
  prior = c(
    set_prior("normal(1.8, 0.4)", class = "b", coef = "residueY"),
    set_prior("normal(1.6, 0.4)", class = "b", coef = "resistanceS"),
    set_prior("normal(1.1, 1.3)", class = "b", coef = "resistanceMS"),
    set_prior("normal(0.8, 2.3)", class = "b", coef = "snb_onsetpre_anthesis"),
    set_prior("normal(-3.9, 5)", class = "Intercept"),
    set_prior("gamma(2, 0.01)", class = "phi"),
    set_prior("cauchy(0, 2.5)", class = "sd")
  ),
  chains = chain, iter = iter, warmup = warmup, cores = cores,
  seed = seed, control = control, file = "results/models/M2"
)
Family: beta
Links: mu = logit
Formula: sev ~ residue + snb_onset + resistance + (1 | site)
Data: SNB_train (Number of observations: 97)
Draws: 4 chains, each with iter = 10000; warmup = 2000; thin = 1;
total post-warmup draws = 32000
Multilevel Hyperparameters:
~site (Number of levels: 14)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.94 0.21 0.62 1.45 1.00 5825 11098
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -4.34 0.28 -4.89 -3.80 1.00 5611
residueY 1.53 0.10 1.33 1.73 1.00 17136
snb_onsetpre_anthesis 0.00 0.11 -0.21 0.22 1.00 16952
resistanceMS 0.24 0.09 0.07 0.43 1.00 18415
resistanceS 1.01 0.10 0.82 1.19 1.00 18706
Tail_ESS
Intercept 9408
residueY 19946
snb_onsetpre_anthesis 20002
resistanceMS 20507
resistanceS 20266
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 150.03 22.99 108.29 198.22 1.00 17420 19359
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Click to view Stan code for M2
Code
cat(make_stancode(M2))
// generated with brms 2.23.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> phi;  // precision parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  // prior contributions to the log posterior
  real lprior = 0;
  r_1_1 = (sd_1[1] * (z_1[1]));
  lprior += normal_lpdf(b[1] | 1.8, 0.4);
  lprior += normal_lpdf(b[2] | 0.8, 2.3);
  lprior += normal_lpdf(b[3] | 1.1, 1.3);
  lprior += normal_lpdf(b[4] | 1.6, 0.4);
  lprior += normal_lpdf(Intercept | -3.9, 5);
  lprior += gamma_lpdf(phi | 2, 0.01);
  lprior += cauchy_lpdf(sd_1 | 0, 2.5) - 1 * cauchy_lccdf(0 | 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    mu = inv_logit(mu);
    target += beta_lpdf(Y | mu .* phi, (1 - mu) .* phi);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Click to view priors for M2
Code
prior_summary(M2)
2.2.3 M3: Environmental similarity
Relaxes the assumption of independence between group-level intercepts by introducing an environmental similarity matrix.
Formula:
sev ~ residue + snb_onset + resistance + (1|gr(site, cov=euclimat))
Key Features:
- Environmental similarity matrix from climate data
- Improved prediction for held-out sites via information borrowing
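A hedged sketch of how such a similarity matrix can be constructed, assuming a hypothetical matrix `clim` of scaled climate summaries per site (the simulated data and the exponential-decay kernel are our illustrative choices; the actual workflow may derive `euclimat` differently):

```r
# Build a site-by-site similarity matrix from climate summaries.
# brms's gr(site, cov = ...) expects a positive-definite matrix whose
# dimnames match the levels of the site factor.
set.seed(1)
clim <- matrix(rnorm(14 * 5), nrow = 14,
               dimnames = list(paste0("site", 1:14), paste0("clim", 1:5)))

d <- as.matrix(dist(clim))       # Euclidean distances between sites
euclimat <- exp(-d / max(d))     # exponential kernel: distance -> similarity in (0, 1]
diag(euclimat) <- 1              # each site is maximally similar to itself

# Passed to the model via: brm(..., data2 = list(euclimat = euclimat))
```

Sites that are climatically close then share information through correlated random intercepts, which is what enables prediction for held-out sites.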
Family: beta
Links: mu = logit
Formula: sev ~ residue + snb_onset + resistance + (1 | gr(site, cov = euclimat))
Data: SNB_train (Number of observations: 97)
Draws: 4 chains, each with iter = 10000; warmup = 2000; thin = 1;
total post-warmup draws = 32000
Multilevel Hyperparameters:
~site (Number of levels: 14)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.13 0.26 0.74 1.75 1.00 6481 12174
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept -4.25 0.68 -5.61 -2.90 1.00 6384
residueY 1.53 0.10 1.33 1.74 1.00 20954
snb_onsetpre_anthesis 0.00 0.11 -0.22 0.22 1.00 21672
resistanceMS 0.25 0.09 0.07 0.43 1.00 20661
resistanceS 1.01 0.10 0.82 1.20 1.00 21459
Tail_ESS
Intercept 10170
residueY 20945
snb_onsetpre_anthesis 22082
resistanceMS 21903
resistanceS 21435
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 149.70 23.21 107.66 198.36 1.00 22151 21377
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Click to view Stan code for M3
Code
cat(make_stancode(M3))
// generated with brms 2.23.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  matrix[N_1, N_1] Lcov_1;  // cholesky factor of known covariance matrix
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> phi;  // precision parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  // prior contributions to the log posterior
  real lprior = 0;
  r_1_1 = (sd_1[1] * (Lcov_1 * z_1[1]));
  lprior += normal_lpdf(b[1] | 1.8, 0.4);
  lprior += normal_lpdf(b[2] | 0.8, 2.3);
  lprior += normal_lpdf(b[3] | 1.1, 1.3);
  lprior += normal_lpdf(b[4] | 1.6, 0.4);
  lprior += normal_lpdf(Intercept | -3.9, 5);
  lprior += gamma_lpdf(phi | 2, 0.01);
  lprior += cauchy_lpdf(sd_1 | 0, 2.5) - 1 * cauchy_lccdf(0 | 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    mu = inv_logit(mu);
    target += beta_lpdf(Y | mu .* phi, (1 - mu) .* phi);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Click to view priors for M3
Code
prior_summary(M3)
2.2.4 M4: Selected weather predictors
Introduces six selected pre-anthesis weather variables identified through variable selection procedures. Uses hierarchical structure to separate agronomic and weather effects.
// generated with brms 2.23.0
functions {
  /* compute scale parameters of the R2D2 prior
   * Args:
   *   phi: local weight parameters
   *   tau2: global scale parameter
   * Returns:
   *   scale parameter vector of the R2D2 prior
   */
  vector scales_R2D2(vector phi, real tau2) {
    return sqrt(phi * tau2);
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K_agronomic;  // number of population-level effects
  matrix[N, K_agronomic] X_agronomic;  // population-level design matrix
  int<lower=1> Kc_agronomic;  // number of population-level effects after centering
  int<lower=1> K_rest;  // number of population-level effects
  matrix[N, K_rest] X_rest;  // population-level design matrix
  int<lower=1> Kscales_rest;  // number of local scale parameters
  // data for the R2D2 prior
  real<lower=0> R2D2_mean_R2_rest;  // mean of the R2 prior
  real<lower=0> R2D2_prec_R2_rest;  // precision of the R2 prior
  // concentration vector of the D2 prior
  vector<lower=0>[Kscales_rest] R2D2_cons_D2_rest;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_rest_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc_agronomic] Xc_agronomic;  // centered version of X_agronomic without an intercept
  vector[Kc_agronomic] means_X_agronomic;  // column means of X_agronomic before centering
  for (i in 2:K_agronomic) {
    means_X_agronomic[i - 1] = mean(X_agronomic[, i]);
    Xc_agronomic[, i - 1] = X_agronomic[, i] - means_X_agronomic[i - 1];
  }
}
parameters {
  vector[Kc_agronomic] b_agronomic;  // regression coefficients
  real Intercept_agronomic;  // temporary intercept for centered predictors
  vector[K_rest] zb_rest;  // unscaled coefficients
  // parameters of the R2D2 prior
  real<lower=0,upper=1> R2D2_R2_rest;
  simplex[Kscales_rest] R2D2_phi_rest;
  real<lower=0> phi;  // precision parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
}
transformed parameters {
  vector[K_rest] b_rest;  // scaled coefficients
  vector<lower=0>[K_rest] sdb_rest;  // SDs of the coefficients
  real R2D2_tau2_rest;  // global R2D2 scale parameter
  vector<lower=0>[Kscales_rest] scales_rest;  // local R2D2 scale parameters
  vector[N_1] r_1_rest_1;  // actual group-level effects
  // prior contributions to the log posterior
  real lprior = 0;
  // compute R2D2 scale parameters
  R2D2_tau2_rest = R2D2_R2_rest / (1 - R2D2_R2_rest);
  scales_rest = scales_R2D2(R2D2_phi_rest, R2D2_tau2_rest);
  sdb_rest = scales_rest[(1):(K_rest)];
  b_rest = zb_rest .* sdb_rest;  // scale coefficients
  r_1_rest_1 = (sd_1[1] * (z_1[1]));
  lprior += normal_lpdf(b_agronomic[1] | 1.8, 0.4);
  lprior += normal_lpdf(b_agronomic[2] | 1.1, 1.3);
  lprior += normal_lpdf(b_agronomic[3] | 1.6, 0.4);
  lprior += normal_lpdf(b_agronomic[4] | 0.8, 2.3);
  lprior += normal_lpdf(Intercept_agronomic | -3.9, 5);
  lprior += beta_lpdf(R2D2_R2_rest | R2D2_mean_R2_rest * R2D2_prec_R2_rest,
                      (1 - R2D2_mean_R2_rest) * R2D2_prec_R2_rest);
  lprior += gamma_lpdf(phi | 2, 0.01);
  lprior += cauchy_lpdf(sd_1 | 0, 2.5) - 1 * cauchy_lccdf(0 | 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_agronomic = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_rest = rep_vector(0.0, N);
    // initialize non-linear predictor term
    vector[N] mu;
    nlp_agronomic += Intercept_agronomic + Xc_agronomic * b_agronomic;
    nlp_rest += X_rest * b_rest;
    for (n in 1:N) {
      // add more terms to the linear predictor
      nlp_rest[n] += r_1_rest_1[J_1[n]] * Z_1_rest_1[n];
    }
    for (n in 1:N) {
      // compute non-linear predictor values
      mu[n] = inv_logit(nlp_agronomic[n] + nlp_rest[n]);
    }
    target += beta_lpdf(Y | mu .* phi, (1 - mu) .* phi);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(zb_rest);
  target += dirichlet_lpdf(R2D2_phi_rest | R2D2_cons_D2_rest);
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
  // actual population-level intercept
  real b_agronomic_Intercept = Intercept_agronomic - dot_product(means_X_agronomic, b_agronomic);
}
Click to view priors for M9
Code
prior_summary(M9)
Model comparison
2.2.10 CV0 Performance
We evaluate all models using 5-fold cross-validation on the held-out test sites (CV0). This provides a rigorous assessment of model generalizability to new locations.
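The error metric itself is straightforward; here is a self-contained sketch with toy numbers (the `rmse` helper and the vectors are illustrative stand-ins, not workflow objects; in the workflow, predictions for held-out sites would come from something like `posterior_epred(model, newdata = SNB_test0, allow_new_levels = TRUE)`):

```r
# RMSE between observed severities and posterior-mean predictions.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

obs  <- c(0.02, 0.05, 0.11, 0.30)   # illustrative observed severities
pred <- c(0.03, 0.06, 0.09, 0.25)   # illustrative posterior-mean predictions
rmse(obs, pred)
```

Setting `allow_new_levels = TRUE` matters for CV0: the held-out sites have no estimated intercepts, so their site effects are drawn from the group-level distribution.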
2.2.11 Leave-One-Out Cross-Validation
Leave-One-Out Cross-Validation (LOO-CV) provides an information-theoretic comparison of models based on their expected predictive performance. Models are ranked by expected log pointwise predictive density (ELPD).
Code
# Fit models for LOO comparison
fit_M1 = add_criterion(M1, "loo")
fit_M2 = add_criterion(M2, "loo")
fit_M3 = add_criterion(M3, "loo")
fit_M4 = add_criterion(M4, "loo")
fit_M5 = add_criterion(M5, "loo")
fit_M6 = add_criterion(M6, "loo")
fit_M7 = add_criterion(M7, "loo")
fit_M8 = add_criterion(M8, "loo")
fit_M9 = add_criterion(M9, "loo")

# LOO comparison
lootest = loo_compare(fit_M1, fit_M2, fit_M3, fit_M4, fit_M5, fit_M6,
                      fit_M7, fit_M8, fit_M9,
                      criterion = "loo", model_names = NULL)
lootest
3 Results: Management and Host Factors
3.1 Wheat residue
3.1.1 Conditional predictions
3.1.2 Marginal effects
3.2 Cultivar reaction class
3.2.1 Conditional predictions
3.2.2 Marginal effects
3.3 Disease onset timing
3.3.1 Conditional predictions
3.3.2 Marginal effects
4 Results: Site effects
Site-level random intercepts
Sites with positive deviations experienced systematically higher disease severity after accounting for weather and management factors, suggesting persistent local conditions favoring disease (e.g., high background inoculum, favorable microclimate).